Build an Agentic Voice AI That Understands, Plans, and Speaks Autonomously
Tutorial shows how to assemble a real-time voice AI agent that transcribes, reasons, plans, and speaks using Whisper and SpeechT5.
Microsoft AI Lab released MAI-Voice-1 for fast, high-fidelity speech generation and MAI-1-preview, a homegrown foundation language model optimized for conversational tasks and gradual product integration.
Microsoft released VibeVoice 1.5B, an open-source TTS model that generates up to 90 minutes of expressive audio with up to four speakers and supports cross-lingual and singing synthesis.
Kyutai has launched a streaming TTS model with 2 billion parameters, achieving 220 ms latency and trained on 2.5 million hours of speech. This open-source model supports multiple users and real-time applications, advancing speech AI technology.
Discover how AI voices have evolved from robotic tones to natural, human-like speech, transforming fields like accessibility, entertainment, and customer support.
Chinese researchers release LLaMA-Omni2, a modular speech language model that enables real-time spoken dialogue with minimal latency and strong performance using compact training data.
VERSA is a new, versatile evaluation toolkit integrating 65 metrics for speech, audio, and music assessment, offering unprecedented flexibility and standardization in generative audio evaluation.